
Pr audit test hardening #27

Merged
MrChengLen merged 2 commits into main from pr-audit-test-hardening
May 8, 2026


@MrChengLen
Owner

No description provided.

MrChengLen and others added 2 commits May 8, 2026 22:56
…oarding

Three connected fixes addressing the post-audit Doc-A findings (H5
api-reference.md staleness, M2 MAX_UPLOAD inconsistency) and the
PM-Agent's R3 (.env.example structure), plus one supporting hook
change (item 4) so the updated .env.example passes the secret scan.
The aim is that a Self-Hoster's first 30 minutes work without trial
and error: a Technology-First / developer-discovery investment.

1. app/core/config.py — MAX_UPLOAD_SIZE_MB default 2000 → 100 MB

The .env.example, api-reference.md, and self-hosting.md all said the
default was 100 MB; the code default was 2000 MB (2 GB). With zero
real users yet, the canonical value can move freely. 100 MB is sane
for unconfigured Self-Hosters (avoids OOM-by-default), matches the
quota tiers in app/core/quotas.py for the anonymous tier, and tracks
the docs that were already published. Operators with bigger payloads
override via env-var.
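
A minimal sketch of the default-plus-override behaviour described in item 1. It assumes a plain `os.environ` lookup; the real app/core/config.py may use a settings library instead, and `load_max_upload_bytes` is a hypothetical helper name used only for illustration.

```python
import os

DEFAULT_MAX_UPLOAD_SIZE_MB = 100  # new default from this commit (previously 2000)


def load_max_upload_bytes(environ=None):
    """Return the upload cap in bytes, honouring a MAX_UPLOAD_SIZE_MB override.

    Hypothetical helper: an unset or empty variable falls back to the default.
    """
    env = os.environ if environ is None else environ
    raw = env.get("MAX_UPLOAD_SIZE_MB", "").strip()
    size_mb = int(raw) if raw else DEFAULT_MAX_UPLOAD_SIZE_MB
    if size_mb <= 0:
        raise ValueError("MAX_UPLOAD_SIZE_MB must be a positive integer")
    return size_mb * 1024 * 1024
```

An operator with bigger payloads would set, for example, MAX_UPLOAD_SIZE_MB=2000 in the environment to restore the old limit.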

2. .env.example — sectioned by deployment edition

Reorganised into four labelled sections so a self-hoster reading
top-down hits only the keys their deployment needs:
  - Required for every deployment (host/port, API_KEYS_FILE,
    MAX_UPLOAD, CORS, APP_BASE_URL, optional API_BASE_URL split)
  - Cloud-overlay (JWT_SECRET, DATABASE_URL, Stripe, SMTP,
    PRICING_PAGE_ENABLED) — empty values keep features off
  - Compliance-Edition tunables (AUDIT_FAIL_CLOSED, RETENTION_HOURS)
  - Operational knobs (METRICS_ENABLED, sweep cadence,
    concurrency cap)

Variables that were missing from the example (JWT_SECRET, the Stripe
keys, SMTP fields) are now visible as commented-out entries with
purpose notes — a Self-Hoster who wants to enable accounts can see
the exact set of env-vars to set without grep-hunting through code.
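
The "empty values keep features off" rule for the Cloud-overlay section could be sketched as follows. The grouping of keys per feature and the names STRIPE_SECRET_KEY and SMTP_HOST are assumptions for illustration; only JWT_SECRET and DATABASE_URL are named in this commit.

```python
# Hypothetical mapping of optional features to the env keys they need.
# Only JWT_SECRET and DATABASE_URL appear in the commit message; the
# Stripe and SMTP key names below are illustrative placeholders.
FEATURE_KEYS = {
    "accounts": ["JWT_SECRET", "DATABASE_URL"],
    "billing": ["STRIPE_SECRET_KEY"],
    "email": ["SMTP_HOST"],
}


def enabled_features(env):
    """A feature is on only when every key it needs has a non-empty value."""
    return {
        name: all(env.get(key, "").strip() for key in keys)
        for name, keys in FEATURE_KEYS.items()
    }
```

With this shape, a Community deployment that leaves the Cloud-overlay block commented out or empty gets every optional feature switched off without extra flags.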

3. docs/api-reference.md — append, do not rewrite

Existing single-file structure preserved. Added:
  - Authentication: explicit two-scheme table (X-API-Key for
    Community / scripts; Authorization: Bearer for Cloud overlay).
    Login / refresh examples for the JWT path. Token placeholder
    syntax (<access-token>) chosen so static-analysis tools don't
    mis-flag the example as a leaked secret.
  - Cloud-Edition endpoints summary: /api/v1/auth/*, /api/v1/keys,
    /api/v1/billing/* — each as a one-line entry with auth
    requirement and purpose. Avoids re-documenting schema; defers to
    the auto-generated Swagger UI at /docs for request bodies.
  - Batch endpoints: /api/v1/convert/batch + /api/v1/compress/batch
    with their multipart shape and 200/422 semantics.
  - Response Headers section: X-Output-SHA256 (every conversion),
    X-Data-Classification (BSI taxonomy echo), X-FileMorph-Achieved-
    Bytes / X-FileMorph-Final-Quality (target_size_kb path),
    Retry-After (503 path).
  - Error Responses: added 403, 415, 503 with semantic notes.
  - Rate Limiting table now includes /ready and the billing
    endpoints.
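
A client-side sketch of how the X-Output-SHA256 response header might be used. It assumes the header carries the lowercase or uppercase hex SHA-256 digest of the response body; that encoding is an assumption here, not something this commit states, and `output_matches_digest` is a hypothetical helper.

```python
import hashlib
import hmac


def output_matches_digest(body: bytes, headers: dict) -> bool:
    """Check a downloaded file against the X-Output-SHA256 header.

    Assumes the header value is a hex digest of the body. `headers` is a
    plain dict here; real HTTP clients expose case-insensitive mappings.
    """
    expected = headers.get("X-Output-SHA256", "").strip().lower()
    if not expected:
        return False  # no digest to compare against
    actual = hashlib.sha256(body).hexdigest()
    return hmac.compare_digest(actual, expected)
```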

4. .githooks/pre-{commit,push} — allow .env.example

The hook's SECRET_ASSIGN regex correctly catches lines like
`JWT_SECRET=...`, but `.env.example` is by definition the place to
show those keys with placeholder values for self-hosters. Added
`\.env\.example` to ALLOW_RE so legitimate documentation updates
to that file aren't blocked.
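
The interaction between the allowlist and the secret scan can be sketched in Python. Both patterns below are simplified stand-ins written for illustration; the real ALLOW_RE and SECRET_ASSIGN live in the shell hooks under .githooks/ and will differ.

```python
import re

# Simplified stand-ins, NOT the hook's actual patterns.
ALLOW_RE = re.compile(r"(^|/)\.env\.example$|^locale/.*")
SECRET_ASSIGN = re.compile(
    r"^\s*[A-Z0-9_]*(SECRET|TOKEN|PASSWORD)[A-Z0-9_]*\s*=\s*\S+",
    re.MULTILINE,
)


def flagged(path, content):
    """Return True when a staged file should be blocked by the content scan."""
    if ALLOW_RE.search(path):
        return False  # allowlisted paths skip the content-pattern scan
    return bool(SECRET_ASSIGN.search(content))
```

Under these stand-in patterns, `JWT_SECRET=change-me` is accepted in .env.example but still blocked in application code.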

Verified: 473 tests passing, ruff clean, drift-check unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…6/M7/M9/M10

Phase 3 of the post-audit remediation plan (logical-beaming-brooks.md).
Pure regression coverage — zero behaviour change in production code. The
codebase already satisfies every assertion; this PR keeps it that way.

Findings closed
---------------
H3 — tests/test_billing_consent.py
  Two new tests pin the SHA-256 hash-chain end-to-end:
    test_audit_event_chain_intact_across_two_writes asserts verify_chain
    returns None after two real /billing/checkout writes. Catches a
    regression where a future refactor switches the canonical-JSON
    serialiser, the hashing primitive, or the chaining order — events
    would still record, but verify_chain would no longer detect tamper.
    test_audit_event_chain_detects_payload_tampering mutates one row's
    payload_json after-the-fact and asserts verify_chain returns that
    row's id. Pins the property that record_hash binds the payload.
  Without these guards, a silent break in dispute reproducibility (BORA
  §50, BeurkG §39a, ISO 27001 A.12.4.1) would only surface at audit
  time.
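
For readers unfamiliar with the property being pinned, here is a self-contained sketch of a SHA-256 hash chain whose verifier returns None when intact and the offending row's id otherwise. The field names payload_json and record_hash come from the text above; prev_hash, GENESIS, append_event, and the exact hashing layout are assumptions and may not match the production recorder.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed seed value for the first row's prev_hash


def record_hash(prev_hash, payload_json):
    """Bind a row's payload to its predecessor's hash."""
    return hashlib.sha256((prev_hash + payload_json).encode("utf-8")).hexdigest()


def append_event(rows, payload):
    """Append an event, serialised as canonical (sorted, compact) JSON."""
    payload_json = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    prev_hash = rows[-1]["record_hash"] if rows else GENESIS
    rows.append({
        "id": len(rows) + 1,
        "payload_json": payload_json,
        "prev_hash": prev_hash,
        "record_hash": record_hash(prev_hash, payload_json),
    })


def verify_chain(rows):
    """Return None if the chain is intact, else the id of the first bad row."""
    prev = GENESIS
    for row in rows:
        expected = record_hash(prev, row["payload_json"])
        if row["prev_hash"] != prev or row["record_hash"] != expected:
            return row["id"]
        prev = row["record_hash"]
    return None
```

In this sketch, editing a stored payload_json after the fact changes the recomputed hash for that row, which is the behaviour the tampering test relies on.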

H4 — tests/test_hook_allowlist_regression.py (NEW)
  60 parametrized cases across the three regexes shared by
  .githooks/pre-commit and .githooks/pre-push (ALLOW_RE,
  FORBIDDEN_PATHS, INTERNAL_PATHS). Pins:
    - 17 paths that MUST be allowed (locale/*.po, address-bearing
      legal templates, public DPA template, .env.example, ...).
    - 4 application files that must NOT be allowed (content-pattern
      scans must run on app code).
    - 8 ops-only paths that must be FORBIDDEN (compose.prod.yml,
      deploy.sh, runbooks/, docs-internal/, root CLAUDE.md, ...).
    - 14 internal-doc paths that must redirect to docs-internal/
      (admin-cockpit, email-setup, marketing-plan, ...).
    - drift-check that pre-commit and pre-push regexes stay
      identical (otherwise --no-verify defeats the local hook AND
      the pre-push backstop scans different rules).
  Without this guard, dropping `locale/.*` from ALLOW_RE silently
  blocks every i18n update on every developer's machine — the
  developer blames their content, not the regex.
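
The drift-check idea can be sketched as below. It assumes each hook assigns its patterns as single-quoted shell variables on their own lines, which is a guess about the hook layout; `extract_patterns` and `drifted` are hypothetical helper names.

```python
import re

# Assumed hook layout: NAME='pattern' on its own line.
ASSIGN = re.compile(
    r"^(ALLOW_RE|FORBIDDEN_PATHS|INTERNAL_PATHS)='([^']*)'$",
    re.MULTILINE,
)


def extract_patterns(hook_text):
    """Map each shared regex name to its pattern string."""
    return dict(ASSIGN.findall(hook_text))


def drifted(pre_commit_text, pre_push_text):
    """Return the sorted names whose patterns differ between the two hooks."""
    a = extract_patterns(pre_commit_text)
    b = extract_patterns(pre_push_text)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

A regression test built on this would read both hook files from disk and assert that `drifted(...)` returns an empty list.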

M10 — tests/test_billing_consent.py (existing tests amended)
  test_checkout_*_with_acknowledgement_records_audit_event now pin
  rows[0].actor_ip == "testclient" (TestClient default client host).
  Without this, a future commit dropping `request.client.host` from
  the audit-event recorder would still pass — but Compliance Edition
  customers would lose dispute reproducibility (no IP attribution).

M6 — tests/test_public_pages_reachability.py
  test_enterprise_de_renders_authoritative_german now pins
  <html lang="de" (locale resolution) AND `DSGVO or Behörden`
  (DSGVO is the German GDPR label, untranslatable in EN). Either
  drift independently breaks the test — copy edit "Behörden" →
  "Verwaltung" no longer slips through silently.

M7 — tests/test_public_pages_reachability.py
  test_impressum_en_has_preamble_then_german now asserts
  text.index(preamble) < text.index("Verantwortlich"). A template
  inversion (DE body above EN preamble) would still satisfy a
  presence-only check but breaks the document's purpose.
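
The difference between a presence-only check and the ordering assertion is easy to show in isolation. The preamble string below is a made-up placeholder, not the template's real wording.

```python
def both_present(text, preamble, marker="Verantwortlich"):
    """Presence-only check: passes regardless of order."""
    return preamble in text and marker in text


def preamble_precedes(text, preamble, marker="Verantwortlich"):
    """Ordering check: str.index raises ValueError if either part is missing."""
    return text.index(preamble) < text.index(marker)


PREAMBLE = "English preamble (placeholder)"  # hypothetical text
correct = PREAMBLE + "\n\nVerantwortlich: ..."
inverted = "Verantwortlich: ...\n\n" + PREAMBLE
```

Here `both_present` accepts both `correct` and `inverted`, while `preamble_precedes` accepts only `correct`.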

M9 — tests/test_i18n.py
  Four new parametrized assertions on /de/<page> (privacy, terms,
  impressum, security) that pin a stable DE-only marker per page.
  The 200-status smoke test above passes even when messages.mo is
  missing, corrupt, or out-of-sync, because gettext silently falls back
  to the EN msgid. With this guard, a corrupt catalog surfaces as a hard
  failure rather than a silent regression to English.
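
The silent fallback can be reproduced with the standard-library gettext module. This sketch assumes a "messages" domain and loading with fallback=True; the application's actual i18n wiring may differ, and the "Datenschutz" marker is a hypothetical example of a DE-only string.

```python
import gettext
import tempfile


def load_catalog(localedir, lang="de"):
    """Load a catalog; with fallback=True a missing .mo yields NullTranslations."""
    return gettext.translation(
        "messages", localedir=localedir, languages=[lang], fallback=True
    )


# An empty directory stands in for a deployment whose messages.mo is missing.
with tempfile.TemporaryDirectory() as empty_dir:
    catalog = load_catalog(empty_dir)

rendered = catalog.gettext("Privacy Policy")  # comes back as the EN msgid
de_marker_present = "Datenschutz" in rendered  # hypothetical DE-only marker
```

The page would still render with status 200 in this situation, which is why a status-only smoke test cannot catch it and a DE-only marker assertion can.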

Verification
------------
  pytest tests/test_i18n.py tests/test_public_pages_reachability.py
         tests/test_billing_consent.py tests/test_hook_allowlist_regression.py
  → 115 passed
  pytest tests/  → 539 passed, 15 skipped (no regressions)
  ruff check + ruff format --check  → clean

Out of scope (deferred to follow-up)
------------------------------------
  - L4 — /de/dashboard auth-gated content assertion.
  - L5 — drop-zone hidden initial state assertion.
  - Phase 2 doc fixes (M1 Caddyfile syntax, M3 UFW order, H6
    docs/email-setup.md decision) — separate PR.
MrChengLen merged commit d484744 into main on May 8, 2026
4 checks passed
